Modeling topic dependencies in semantically coherent text spans with copulas

نویسندگان

  • Georgios Balikas
  • Hesam Amoualian
  • Marianne Clausel
  • Éric Gaussier
  • Massih-Reza Amini
چکیده

The exchangeability assumption in topic models like Latent Dirichlet Allocation (LDA) often results in inferring inconsistent topics for the words of text spans like noun-phrases, which are usually expected to be topically coherent. We propose copulaLDA, that extends LDA by integrating part of the text structure to the model and relaxes the conditional independence assumption between the word-specific latent topics given the per-document topic distributions. To this end, we assume that the words of text spans like noun-phrases are topically bound and we model this dependence with copulas. We demonstrate empirically the effectiveness of copulaLDA on both intrinsic and extrinsic evaluation tasks on several publicly available corpora.

برای دانلود رایگان متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Capturing Semantically Meaningful Word Dependencies with an Admixture of Poisson MRFs

We develop a fast algorithm for the Admixture of Poisson MRFs (APM) topic model [1] and propose a novel metric to directly evaluate this model. The APM topic model recently introduced by Inouye et al. [1] is the first topic model that allows for word dependencies within each topic unlike in previous topic models like LDA that assume independence between words within a topic. Research in both th...

متن کامل

Learning Summary Content Units with Topic Modeling

In the field of multi-document summarization, the Pyramid method has become an important approach for evaluating machine-generated summaries. The method is based on the manual annotation of text spans with the same meaning in a set of human model summaries. In this paper, we present an unsupervised, probabilistic topic modeling approach for automatically identifying such semantically similar te...

متن کامل

Automatic keyword extraction using Latent Dirichlet Allocation topic modeling: Similarity with golden standard and users' evaluation

Purpose: This study investigates the automatic keyword extraction from the table of contents of Persian e-books in the field of science using LDA topic modeling, evaluating their similarity with golden standard, and users' viewpoints of the model keywords. Methodology: This is a mixed text-mining research in which LDA topic modeling is used to extract keywords from the table of contents of sci...

متن کامل

Grounding Topic Models with Knowledge Bases

Topic models represent latent topics as probability distributions over words which can be hard to interpret due to the lack of grounded semantics. In this paper, we propose a structured topic representation based on an entity taxonomy from a knowledge base. A probabilistic model is developed to infer both hidden topics and entities from text corpora. Each topic is equipped with a random walk ov...

متن کامل

Topic Modeling and Classification of Cyberspace Papers Using Text Mining

The global cyberspace networks provide individuals with platforms to can interact, exchange ideas, share information, provide social support, conduct business, create artistic media, play games, engage in political discussions, and many more. The term cyberspace has become a conventional means to describe anything associated with the Internet and the diverse Internet culture. In fact, cyberspac...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2016